Support for external allocators: #748
I cannot tell you how much better everything works now in my engine. Memory fragmentation is way down and it is harder to run out of memory during texture stream-in/out events. If you need me to clean it up and de-duplicate code: with pleasure. But I cannot stress enough how much more efficient this is compared to constant VkDeviceMemory creation and deletion. Not to mention, this should also save you from the 4K limit on live VkDeviceMemory objects in Windows drivers. Additionally, the fragmentation this was causing, when coupled with too much content GPU-side (i.e. too many large textures filling the VRAM close to capacity), would result in DEVICE_LOST events rather than an OOM error. The driver would reset. My YouTube videos would stop playing. Now I properly get a VK_ERROR_OUT_OF_DEVICE_MEMORY, which can be gracefully handled. I develop on a 1050 Ti btw, which is my minspec.
Thank you for this. Does it resolve the remaining parts of #567? Please de-duplicate the code. I suggest calling the new function from the existing I need code to test this. At a minimum please add a new sample to |
Ping @toomuchvoltage. |
Hi @MarkCallow , apologies for the delay, have been very busy. I will get back to you shortly with some updates. |
Force-pushed from 170ab4d to 398d489.
Hi @MarkCallow , so the most recent force-push basically addresses all of #567 . Speaking of VMA, it exposes these more granular calls for advanced usage, all of which perform mutually exclusive
I will gladly proceed to add a test, but just to make sure: would it be OK for the test to have a dependency on VMA? There is no better way to genuinely test this without making that functional dependency explicit. Also @MarkCallow, please let me know whether the specifications provided are in line with our doxygen formatting. |
Hi @MarkCallow , I just made another improvement to reduce the API's surface area. All suballocator callbacks are now packed into a single struct. An optional pointer to that struct can be passed reducing
|
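To illustrate the idea of packing the callbacks into one struct and passing it through an optional pointer, here is a minimal sketch. Every name in it (`SubAllocatorCallbacks`, `AllocFn`, `FreeFn`, `uploadTexture`) is hypothetical and simplified to two callbacks; it does not reproduce the typedefs or signatures actually used in this PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for the 64-bit allocation ID discussed in this PR.
using AllocationId = std::uint64_t;

// Hypothetical callback typedefs, one per suballocator operation.
typedef AllocationId (*AllocFn)(std::uint64_t size, std::uint64_t alignment);
typedef void (*FreeFn)(AllocationId id);

// All callbacks travel together in one struct instead of separate parameters.
struct SubAllocatorCallbacks {
    AllocFn allocate;
    FreeFn  release;
};

// Built-in path used when the caller supplies no callbacks.
static AllocationId defaultAllocate(std::uint64_t, std::uint64_t) { return 1; }
static void defaultRelease(AllocationId) {}

// Hypothetical upload entry point: `callbacks` is optional, and a null
// pointer selects the built-in allocation path.
AllocationId uploadTexture(std::uint64_t size, std::uint64_t alignment,
                           const SubAllocatorCallbacks* callbacks = nullptr) {
    static const SubAllocatorCallbacks builtin = { defaultAllocate, defaultRelease };
    const SubAllocatorCallbacks& cb = callbacks ? *callbacks : builtin;
    return cb.allocate(size, alignment);
}
```

Existing callers that pass nothing keep the old behavior, while a caller with its own allocator fills in the struct once and passes its address.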
Actually, @MarkCallow let me know if the signature changes to |
I like having the sub-allocation functions in a structure and the typedefs. The signature changes to The new tests having a dependency on VMA is fine provided that only those tests have the dependency and the rest of the KTX-Software project can be built without it. How big is VMA? Is it small enough to include in the KTX-Software repo or should we require people to download it? |
Force-pushed from 8d8519d to 66a2f6c.
Hi @MarkCallow , just made sure that the old function signatures remain as is. Would appreciate a quick check to ensure that my specifications are also in line with our doxygen rules. I also managed to make yet another improvement: I ensured that the interface can take in (hopefully thread-safe) suballocator callbacks that have the potential to do sparse bindings. As a result, the suballocator callbacks no longer return a I also made accesses to the suballocator directory in my examples thread-safe. Here are my hand-rolled thread-safe suballocator callbacks that have potential sparse bindings support: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L288-L411 |
Oh I forgot to mention, VMA is a single header library... but it is written in C++ while providing a C interface as well. |
Looks great overall. I spotted some documentation things I'd like fixed and have a couple of questions about error handling.
You need to fix the code warnings that are causing CI builds to fail. (CI builds are done with warnings as errors set.)
Dear @MarkCallow , I think I pretty much addressed everything. Regarding checking the completeness of suballocator callbacks in If everything looks good at this stage, let me know and I will provide the VMA tests shortly. Much appreciated! |
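One way a completeness check on the suballocator callbacks could look is sketched below. It assumes a policy where a null struct pointer means "use the built-in allocator" and a partially filled struct is rejected. The struct layout, the simplified signatures and the `CallbackCheck` enum are all hypothetical; the thread does not show how the PR actually performs this check.

```cpp
#include <cstdint>

using AllocationId = std::uint64_t;

// Hypothetical struct holding the six operations named in this PR:
// allocate, bind image, bind buffer, map, unmap and free.
struct SubAllocatorCallbacks {
    AllocationId (*allocate)(std::uint64_t size, std::uint64_t alignment);
    bool (*bindImage)(AllocationId id);
    bool (*bindBuffer)(AllocationId id);
    bool (*map)(AllocationId id, void** data);
    void (*unmap)(AllocationId id);
    void (*release)(AllocationId id);
};

enum class CallbackCheck { UseBuiltin, UseCallbacks, Incomplete };

// Returns which path to take; a partially filled struct is treated as an error
// so that a missing callback is caught up front rather than mid-upload.
CallbackCheck checkCallbacks(const SubAllocatorCallbacks* cb) {
    if (cb == nullptr) return CallbackCheck::UseBuiltin;
    const bool complete = cb->allocate && cb->bindImage && cb->bindBuffer &&
                          cb->map && cb->unmap && cb->release;
    return complete ? CallbackCheck::UseCallbacks : CallbackCheck::Incomplete;
}
```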
The failure in some of the Windows builds is due to a clang update, with a new warning, in the GitHub CI Windows runners. I plan to fix the code later today. After the fix you will need to rebase on main. |
@MarkCallow Sounds good. Just pushed changes addressing the most recent feedback. If all looks good, I'll get the tests ready. Also would appreciate a ping here once your changes on main are complete! 🙏 |
All looks good. I'll ping you when the build issue is fixed. I'm suffering from a very very slow computer I have to use for testing fixes to the issue. |
The fix for clang 16 is now in |
Thanks @MarkCallow . I will post some updates soon. I'm tied up a bit on my end at the moment. |
Hi @MarkCallow , I just added a new test under |
Force-pushed from 01ab174 to e509fc9.
Hi @MarkCallow , all done! These function signatures are general... and the current implementations apply to everything except sparse bindings. However, please note that I had to move I need to amend: using the callbacks more than once (i.e. for different textures and so on) wouldn't really increase coverage. VMA will just keep getting a VkDeviceMemory with an offset. Another test could be added later to cover sparse bindings (once we add support)... but that would need very explicit scenarios set up. For example:
|
Name change is fine.
Thinking to also move the useSubAllocator bool to VulkanLoadTestSample but ideally then VulkanLoadTestSample would handle parsing the command line for the matching option. IIRC there is no hierarchical processing of the command line. So we'll leave this idea for later.
Please look into the build errors on Windows. They look like something to do with the way you are using VMA.
Hi @MarkCallow , I need your help on this one. I've been unable to reproduce it on my side. My usage is perfectly in line with what VMA wants: just 1 include in a CPP file. It seems like it's a bunch of warnings about VMA itself? I thought dropping the file in |
Possibly msvc has no
and
around the include of VMA. |
Hi @MarkCallow , can you give this another spin? Hopefully it's fixed. |
Dear @MarkCallow , I also attempted to address some warnings highlighted here in the last commit. If you could review and let me know whether the changes actually address these, it would be greatly appreciated! 🙏 |
Force-pushed from 04783ad to 74e7118.
Please revert the 2 changes indicated.
You also need to suppress warning 4189, which I missed in the blizzard of 4100 warnings in the CI log, and I transposed digits in one of the other warnings. It should be 4324, not 4234. Sorry about that. As far as I can see, you have fixed the undocumented |
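For readers following along, a common way to silence warnings from a third-party header under MSVC without relaxing warnings-as-errors for the rest of the project is to wrap the include in a `#pragma warning(push)` / `#pragma warning(pop)` pair. The sketch below assumes that approach and uses the three warning numbers named above; the thread does not show the exact form used in the PR. A trivial stand-in function takes the place of the real VMA include so that the fragment compiles on its own.

```cpp
// Silence third-party warnings only for the duration of the include, so that
// warnings-as-errors still applies to the project's own code.
#if defined(_MSC_VER)
  #pragma warning(push)
  // 4100: unreferenced formal parameter
  // 4189: local variable is initialized but not referenced
  // 4324: structure was padded due to alignment specifier
  #pragma warning(disable : 4100 4189 4324)
#endif

// In the real translation unit this is where VMA would be pulled in, e.g.
//   #define VMA_IMPLEMENTATION
//   #include "vk_mem_alloc.h"
// A tiny stand-in (hypothetical) keeps this sketch self-contained.
inline int thirdPartyStandIn() { return 7; }

#if defined(_MSC_VER)
  #pragma warning(pop)  // restore the project's normal warning state
#endif
```

The `_MSC_VER` guard keeps the pragmas from affecting other compilers, which would need their own suppression mechanism.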
Force-pushed from 74e7118 to 5eaa1e1.
All done! I guess time for another go... 🙏 |
Looks good. I'll try the build again.
Thank you very much for your hard work on this. |
Did you run the --use-vma test under the Vulkan
|
Hi @MarkCallow , I've identified and fixed the issue. It had to do with my memory property flag detection being tied to a specific vendor (nVidia). I've fixed the issue in my most recent remote branch. You can perhaps rip out the last commit and merge once more if that works. If not, I can open a new PR. |
Please open a new PR. Sorry about the bum steer on the |
No, my bad. That was a really good find. I switched cards to my Radeon VII and the test crashed. |
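The vendor-specific memory property flag detection mentioned above is usually avoided by searching the memory types the device reports rather than assuming a fixed index or layout. The sketch below shows that general technique with stand-in types so it compiles without the Vulkan headers. `MemoryType` and `findMemoryType` are hypothetical names, the flag constants mirror the values of the corresponding `VK_MEMORY_PROPERTY_*` bits, and this is not the fix that was actually pushed.

```cpp
#include <cstdint>
#include <vector>

// Stand-in mirroring the relevant field of VkMemoryType.
struct MemoryType { std::uint32_t propertyFlags; };

// Same bit values as VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, _HOST_VISIBLE_BIT
// and _HOST_COHERENT_BIT.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;

// Returns the index of the first memory type that is permitted by
// `allowedTypeBits` (as in VkMemoryRequirements::memoryTypeBits) and has
// every flag in `requiredFlags`, or -1 if no such type exists.
int findMemoryType(const std::vector<MemoryType>& types,
                   std::uint32_t allowedTypeBits,
                   std::uint32_t requiredFlags) {
    for (std::uint32_t i = 0; i < types.size() && i < 32; ++i) {
        const bool allowed  = (allowedTypeBits & (1u << i)) != 0;
        const bool hasFlags = (types[i].propertyFlags & requiredFlags) == requiredFlags;
        if (allowed && hasFlags) return static_cast<int>(i);
    }
    return -1;
}
```

Because the search only relies on what the device reports, the same code picks a valid index whether the memory types are ordered the way one vendor's driver lists them or another's.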
* The newly introduced API surface area matches that of VMA's advanced usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64-bit allocationId is to be used as a book-keeping measure by external suballocator callbacks to keep track of and free up suballocations. The external allocator can use a hashtable (à la std::unordered_map in C++) to keep track of the page(s) allotted to this suballocation. ('Pages' here refers to potential sparse bindings.)
* Add a VkLoadTest for suballocation callbacks
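The book-keeping described in the commit message can be sketched as a small directory that maps each 64-bit allocation ID to the page(s) backing it, with every access guarded by a mutex. The `Page` record and the `SubAllocationDirectory` class are hypothetical illustrations of that description, with plain integers standing in for Vulkan handles; they are not the code from this PR or from the linked HighOmega callbacks.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record of one page backing a suballocation. `memoryHandle`
// stands in for a VkDeviceMemory handle.
struct Page {
    std::uint64_t memoryHandle;
    std::uint64_t offset;
    std::uint64_t size;
};

class SubAllocationDirectory {
public:
    // Registers the pages of a new suballocation and returns its nonzero ID.
    std::uint64_t add(std::vector<Page> pages) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::uint64_t id = nextId_++;
        pages_.emplace(id, std::move(pages));
        return id;
    }

    // Forgets a suballocation; returns false if the ID is unknown.
    bool remove(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        return pages_.erase(id) == 1;
    }

    // Number of pages recorded for an ID (0 if the ID is unknown).
    std::size_t pageCount(std::uint64_t id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = pages_.find(id);
        return it == pages_.end() ? 0 : it->second.size();
    }

private:
    mutable std::mutex mutex_;  // guards nextId_ and pages_
    std::uint64_t nextId_ = 1;  // 0 is left free to mean "no allocation"
    std::unordered_map<std::uint64_t, std::vector<Page>> pages_;
};
```

An allocate callback would call `add` and hand the returned ID back to the library, and the free callback would call `remove` with the same ID. A real implementation would also release or recycle the underlying memory at that point.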
Support for external allocators:
Each texture now also keeps track of its VkDeviceMemory offset. Potential sparse bindings support removed this.